g03adf
© Numerical Algorithms Group, 2002.
Purpose
G03ADF performs canonical correlation analysis.
Synopsis
[e,ncv,cvx,cvy,ifail] = g03adf(z,isz<,wt,tol,weight,ifail>)
Description
Let there be two sets of variables, x and y. For a sample of n
observations on $n_x$ variables in a data matrix X and $n_y$ variables
in a data matrix Y, canonical correlation analysis seeks to find
a small number of linear combinations of each set of variables in
order to explain or summarise the relationships between them. The
variables thus formed are known as canonical variates.
Let the variance-covariance matrix of the two data sets be
$$\begin{pmatrix} S_{xx} & S_{xy} \\ S_{yx} & S_{yy} \end{pmatrix}$$
and let
$$\Sigma = S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}$$
then the canonical correlations can be calculated from the
eigenvalues of the matrix $\Sigma$. However, G03ADF calculates the
canonical correlations by means of a singular value decomposition
(SVD) of a matrix V. If the rank of the data matrix X is $k_x$ and
the rank of the data matrix Y is $k_y$, and both X and Y have had
variable (column) means subtracted, then the $k_x$ by $k_y$ matrix V is
given by:
$$V = Q_x^T Q_y,$$
where $Q_x$ is the first $k_x$ columns of the orthogonal matrix Q either
from the QR decomposition of X if X is of full column rank, i.e.,
$k_x = n_x$:
$$X = Q_x R_x$$
or from the SVD of X if $k_x < n_x$:
$$X = Q_x D_x P_x^T$$
Similarly $Q_y$ is the first $k_y$ columns of the orthogonal matrix Q
either from the QR decomposition of Y if Y is of full column rank,
i.e., $k_y = n_y$:
$$Y = Q_y R_y$$
or from the SVD of Y if $k_y < n_y$:
$$Y = Q_y D_y P_y^T.$$
Let the SVD of V be:
$$V = U_x \Delta U_y^T$$
then the non-zero elements of the diagonal matrix $\Delta$,
$\delta_i$, for $i = 1, 2, \ldots, l$, are the l canonical correlations
associated with the l canonical variates, where $l = \min(k_x, k_y)$.
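As an illustration of the construction above, the following minimal sketch treats the simplest case of one x-variable and one y-variable, so that the ranks k_x and k_y are both 1. It is not the G03ADF implementation; the function names are hypothetical and the data are made up. With a single non-constant column, the QR factor Q_x is just the centred column scaled to unit length, V = Q_x^T Q_y is a 1 by 1 matrix, and its singular value delta_1 is the magnitude of the ordinary correlation coefficient.

```python
import math


def centre(values):
    """Subtract the mean from a list of numbers (column means removed)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def unit_column(values):
    """Return the centred column scaled to unit length.

    For one non-constant column the thin QR factorisation X = Q_x R_x has
    Q_x equal to this unit vector (up to sign) and R_x equal to its norm.
    """
    centred = centre(values)
    norm = math.sqrt(sum(v * v for v in centred))
    if norm == 0.0:
        raise ValueError("column is constant, so its rank is 0")
    return [v / norm for v in centred]


def single_canonical_correlation(x, y):
    """Canonical correlation for one x-variable and one y-variable."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    q_x = unit_column(x)
    q_y = unit_column(y)
    # V = Q_x^T Q_y is a 1 by 1 matrix in this special case.
    v = sum(a * b for a, b in zip(q_x, q_y))
    # The singular value of a 1 by 1 matrix is its absolute value.
    return abs(v)


# Made-up sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
delta_1 = single_canonical_correlation(x, y)
print(delta_1)
```

With more than one variable per set, Q_x and Q_y have several columns and a full QR (or SVD) and SVD routine would be needed; the sketch deliberately avoids that to stay within the Python standard library.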
The eigenvalues, $\lambda_i^2$, of the matrix $\Sigma$ are given by:
$$\lambda_i^2 = \frac{\delta_i^2}{1 + \delta_i^2}.$$
The value of $\pi_i = \lambda_i^2 / \sum_j \lambda_j^2$ gives the proportion of
variation explained by the ith canonical variate. The values of
the $\pi_i$'s give an indication as to how many canonical variates
are needed to adequately describe the data, i.e., the
dimensionality of the problem.
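The proportions can be sketched in a few lines. This is only an illustration: the function name and the eigenvalues are hypothetical, and the values are supplied directly rather than computed from data.

```python
def proportions_explained(lambda_sq):
    """Return pi_i = lambda_i^2 / sum_j lambda_j^2 for each eigenvalue."""
    total = sum(lambda_sq)
    if total <= 0.0:
        raise ValueError("the eigenvalues must have a positive sum")
    return [value / total for value in lambda_sq]


# Hypothetical eigenvalues lambda_i^2, largest first.
lambda_sq = [0.45, 0.10, 0.05]
pi = proportions_explained(lambda_sq)

# Running total of the proportions, to see how many variates are needed.
cumulative = []
running = 0.0
for share in pi:
    running += share
    cumulative.append(running)

for i, (share, total_so_far) in enumerate(zip(pi, cumulative), start=1):
    print(i, round(share, 4), round(total_so_far, 4))
```

Here the first variate accounts for about three quarters of the total, which would suggest that one or two canonical variates describe most of the relationship in this made-up case.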
To test for a significant dimensionality greater than i, the
$\chi^2$ statistic:
$$\left(n - \tfrac{1}{2}(k_x + k_y)\right) \sum_{j=i+1}^{l} \log(1 + \lambda_j^2)$$
can be used. This is asymptotically distributed as a $\chi^2$
distribution with $(k_x - i)(k_y - i)$ degrees of freedom. If the test
for $i = k_{\min}$ is not significant, then the remaining tests for
$i > k_{\min}$ should be ignored.
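The test statistic and its degrees of freedom can be evaluated as in the following minimal sketch, which follows the formula as stated in this section. The function name and input values are hypothetical, and the sketch is not the G03ADF implementation. It stops short of the significance level, which would come from the upper tail of the chi-square distribution and needs a statistics library.

```python
import math


def dimensionality_tests(n, k_x, k_y, lambda_sq):
    """Return (i, statistic, degrees_of_freedom) for i = 0, ..., l-1.

    lambda_sq holds the l = min(k_x, k_y) eigenvalues lambda_j^2 in order,
    so lambda_sq[i:] corresponds to j = i+1, ..., l in the formula.
    """
    l = min(k_x, k_y)
    if len(lambda_sq) != l:
        raise ValueError("expected min(k_x, k_y) eigenvalues")
    multiplier = n - 0.5 * (k_x + k_y)
    results = []
    for i in range(l):
        statistic = multiplier * sum(math.log(1.0 + v) for v in lambda_sq[i:])
        dof = (k_x - i) * (k_y - i)
        results.append((i, statistic, dof))
    return results


# Hypothetical inputs: 20 observations, ranks 2 and 3, two eigenvalues.
tests = dimensionality_tests(n=20, k_x=2, k_y=3, lambda_sq=[1.0, 0.0])
for i, statistic, dof in tests:
    print(i, statistic, dof)
```

The tests would be read in order of increasing i, stopping at the first one that is not significant, as described above.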
The loadings for the canonical variates are calculated from the
matrices $U_x$ and $U_y$ respectively. These matrices are scaled so
that the canonical variates have unit variance.
Parameters
g03adf
Required Input Arguments:
  z       (:,:)   real
  isz     (:)     integer
Optional Input Arguments:             <Default>
  wt      (:)     real                zeros(size(z,1),1)
  tol             real                sqrt(eps)
  weight  (1)     string              'u'
  ifail           integer             -1
Output Arguments:
  e       (:,6)   real
  ncv             integer
  cvx     (:,:)   real
  cvy     (:,:)   real
  ifail           integer